College dataset

Description
Statistics for a large number of US Colleges from the 1995 issue of US News and World Report.
Dimensions: 777 x 18 (the imported file has 19 columns; the extra first column holds the college names)
Short description of variables (appendix)

Sources
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. The dataset was used in the ASA Statistical Graphics Section's 1995 Data Analysis Exposition.

References
This dataset is part of the course material for the book An Introduction to Statistical Learning with Applications in R
(Ch 02 - Statistical Learning - Applied Exercises - Problem 8)


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Some preliminary workings / imports

In [1]:
In [2]:
In [3]:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8a - Import data

In [4]:
Out[4]:
True
In [5]:
(777, 19)
Out[5]:
Unnamed: 0 Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
0 Abilene Christian University Yes 1660 1232 721 23 52 2885 537 7440 3300 450 2200 70 78 18.1 12 7041 60
1 Adelphi University Yes 2186 1924 512 16 29 2683 1227 12280 6450 750 1500 29 30 12.2 16 10527 56
2 Adrian College Yes 1428 1097 336 22 50 1036 99 11250 3750 400 1165 53 66 12.9 30 8735 54
3 Agnes Scott College Yes 417 349 137 60 89 510 63 12960 5450 450 875 92 97 7.7 37 19016 59
4 Alaska Pacific University Yes 193 146 55 16 44 249 869 7560 4120 800 1500 76 72 11.9 2 10922 15
In [6]:
Structure of data
In [7]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 777 entries, 0 to 776
Data columns (total 19 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Unnamed: 0   777 non-null    object 
 1   Private      777 non-null    object 
 2   Apps         777 non-null    int64  
 3   Accept       777 non-null    int64  
 4   Enroll       777 non-null    int64  
 5   Top10perc    777 non-null    int64  
 6   Top25perc    777 non-null    int64  
 7   F.Undergrad  777 non-null    int64  
 8   P.Undergrad  777 non-null    int64  
 9   Outstate     777 non-null    int64  
 10  Room.Board   777 non-null    int64  
 11  Books        777 non-null    int64  
 12  Personal     777 non-null    int64  
 13  PhD          777 non-null    int64  
 14  Terminal     777 non-null    int64  
 15  S.F.Ratio    777 non-null    float64
 16  perc.alumni  777 non-null    int64  
 17  Expend       777 non-null    int64  
 18  Grad.Rate    777 non-null    int64  
dtypes: float64(1), int64(16), object(2)
memory usage: 115.5+ KB
Check for missing values
In [8]:
Out[8]:
0
In [9]:
Out[9]:
0
In [10]:
Out[10]:
0
Preliminary observations:
- No missing values.
- College/university names are currently stored as a data column ('Unnamed: 0'). They will be moved to the index and removed from the data columns.
- The categorical variable 'Private' is currently stored as object dtype. It will be converted to category.
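The notebook's code cells are not included in this export. Below is a minimal sketch of the preparation steps listed above, assuming pandas. Two rows copied from the head() output stand in for the CSV, because the file name and read options are not shown. 'prepare' is a hypothetical helper name. The notebook's own index also carries a row number, and this sketch leaves that out.

```python
import pandas as pd

# Two rows copied from the head() output above, standing in for pd.read_csv(...)
# (the actual file name and read options are not shown in this export).
raw = pd.DataFrame({
    "Unnamed: 0": ["Abilene Christian University", "Adelphi University"],
    "Private": ["Yes", "Yes"],
    "Apps": [1660, 2186],
    "Accept": [1232, 1924],
    "Enroll": [721, 512],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: move names into the index, store 'Private' as category."""
    out = df.rename(columns={"Unnamed: 0": "College"}).set_index("College")
    out["Private"] = out["Private"].astype("category")
    return out


n_missing = int(raw.isna().sum().sum())  # total missing cells across all columns
college = prepare(raw)
print(n_missing)
print(college.dtypes)
```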
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8b - Data preparation

In [11]:
Private        category
Apps              int64
Accept            int64
Enroll            int64
Top10perc         int64
Top25perc         int64
F.Undergrad       int64
P.Undergrad       int64
Outstate          int64
Room.Board        int64
Books             int64
Personal          int64
PhD               int64
Terminal          int64
S.F.Ratio       float64
perc.alumni       int64
Expend            int64
Grad.Rate         int64
dtype: object
Out[11]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
College
0 Abilene Christian University Yes 1660 1232 721 23 52 2885 537 7440 3300 450 2200 70 78 18.1 12 7041 60
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c - Data exploration

2.8c.1 - Summary statistics

In [12]:
Out[12]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
count 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.000000 777.00000
mean 3001.638353 2018.804376 779.972973 27.558559 55.796654 3699.907336 855.298584 10440.669241 4357.526384 549.380952 1340.642214 72.660232 79.702703 14.089704 22.743887 9660.171171 65.46332
std 3870.201484 2451.113971 929.176190 17.640364 19.804778 4850.420531 1522.431887 4023.016484 1096.696416 165.105360 677.071454 16.328155 14.722359 3.958349 12.391801 5221.768440 17.17771
min 81.000000 72.000000 35.000000 1.000000 9.000000 139.000000 1.000000 2340.000000 1780.000000 96.000000 250.000000 8.000000 24.000000 2.500000 0.000000 3186.000000 10.00000
5% 329.800000 272.400000 118.600000 7.000000 25.800000 509.800000 20.000000 4601.600000 2735.800000 350.000000 500.000000 43.800000 52.800000 8.300000 6.000000 4795.800000 37.00000
25% 776.000000 604.000000 242.000000 15.000000 41.000000 992.000000 95.000000 7320.000000 3597.000000 470.000000 850.000000 62.000000 71.000000 11.500000 13.000000 6751.000000 53.00000
50% 1558.000000 1110.000000 434.000000 23.000000 54.000000 1707.000000 353.000000 9990.000000 4200.000000 500.000000 1200.000000 75.000000 82.000000 13.600000 21.000000 8377.000000 65.00000
75% 3624.000000 2424.000000 902.000000 35.000000 69.000000 4005.000000 967.000000 12925.000000 5050.000000 600.000000 1700.000000 85.000000 92.000000 16.500000 31.000000 10830.000000 78.00000
95% 11066.200000 6979.200000 2757.000000 65.200000 93.000000 14477.800000 3303.600000 18498.000000 6382.000000 765.600000 2488.800000 95.000000 98.000000 21.000000 46.000000 17974.800000 94.20000
max 48094.000000 26330.000000 6392.000000 96.000000 100.000000 31643.000000 21836.000000 21700.000000 8124.000000 2340.000000 6800.000000 103.000000 100.000000 39.800000 64.000000 56233.000000 118.00000
In [13]:
Out[13]:
Private
count 777
unique 2
top Yes
freq 565
In [14]:
Private
---------------
Yes    565
No     212
Name: Private, dtype: int64
------------------------------
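The summary table includes 5% and 95% rows. describe() reports only the 25/50/75 percentiles by default, so the extra percentiles must have been requested. Below is a sketch assuming pandas. The five Apps values from the head() output serve as stand-in data; all five institutions are private, so the category summary is trivial here.

```python
import pandas as pd

# Stand-in data: the five rows shown by head() earlier (all private institutions).
college = pd.DataFrame({
    "Private": pd.Categorical(["Yes"] * 5),
    "Apps": [1660, 2186, 1428, 417, 193],
})

# Numeric summary with the 5th and 95th percentiles added to the default quartiles.
numeric_summary = college.describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95])

# count / unique / top / freq for the categorical column, plus its class counts.
private_summary = college.describe(include=["category"])
private_counts = college["Private"].value_counts()

print(numeric_summary)
print(private_summary)
print(private_counts)
```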
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.2 - Scatterplot matrix

In [15]:
['Private', 'Apps', 'Accept', 'Enroll', 'Top10perc', 'Top25perc', 'Outstate', 'Room.Board', 'Personal', 'PhD', 'Terminal', 'Expend', 'Grad.Rate']
Out[15]:
13
In [16]:
Tentative observations:
- Private colleges tend to have a much higher 'Expend' (instructional expenditure per student).
- There is a moderately positive relationship between Top10perc (percentage of new students from the top 10% of their high-school class) and the out-of-state tuition charged (r = 0.562).
'#####################################################################
Correlation ('-' marks coefficients below the display threshold)
In [17]:
Out[17]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
Apps 1 0.943 0.847 - - 0.814 - - - - - - - - - - -
Accept 0.943 1 0.912 - - 0.874 - - - - - - - - - - -
Enroll 0.847 0.912 1 - - 0.965 - - - - - - - - - - -
Top10perc - - - 1 0.892 - - 0.562 - - - - - - - 0.661 -
Top25perc - - - 0.892 1 - - - - - - - - - - - -
F.Undergrad 0.814 0.874 0.965 - - 1 0.571 - - - - - - - - - -
P.Undergrad - - - - - 0.571 1 - - - - - - - - - -
Outstate - - - 0.562 - - - 1 0.654 - - - - -0.555 0.566 0.673 0.571
Room.Board - - - - - - - 0.654 1 - - - - - - - -
Books - - - - - - - - - 1 - - - - - - -
Personal - - - - - - - - - - 1 - - - - - -
PhD - - - - - - - - - - - 1 0.85 - - - -
Terminal - - - - - - - - - - - 0.85 1 - - - -
S.F.Ratio - - - - - - - -0.555 - - - - - 1 - -0.584 -
perc.alumni - - - - - - - 0.566 - - - - - - 1 - -
Expend - - - 0.661 - - - 0.673 - - - - - -0.584 - 1 -
Grad.Rate - - - - - - - 0.571 - - - - - - - - 1
#'#####################################################################
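The correlation table replaces weaker coefficients with '-'. The cutoff is not stated. The smallest magnitude shown is 0.555, so the sketch below assumes a hypothetical threshold of 0.55. It uses pandas' Pearson DataFrame.corr, and a toy frame (not College data) exercises the function.

```python
import pandas as pd


def strong_correlations(df: pd.DataFrame, threshold: float = 0.55) -> pd.DataFrame:
    """Pearson correlations rounded to 3 decimals; |r| < threshold is shown as '-'.

    The 0.55 default is an assumption, not the notebook's documented cutoff.
    """
    corr = df.corr(numeric_only=True).round(3)
    keep = corr.abs() >= threshold
    # Cast to object first so the '-' placeholder does not force a dtype change.
    return corr.astype(object).where(keep, "-")


# Toy data: y is an exact multiple of x (r = 1), z is only weakly related (r = 0.1).
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [2, 5, 1, 4, 3]})
print(strong_correlations(toy))
```

On the College data, the same function would be applied to the prepared frame, and pandas keeps only its numeric columns.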
Correlation plot
In [18]:
In [19]:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.3 - Boxplot

In [20]:
Observations:
- Out-of-state tuition is generally higher at private institutions than at non-private ones.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.4 - Elite

Elite: institutions with Top10perc > 50, i.e. more than 50% of new students come from the top 10% of their high-school class.

In [21]:
In [22]:
category
Out[22]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate Elite
College
506 Salem-Teikyo University Yes 489 384 120 23 52 700 45 10575 3952 400 620 46 24 13.0 9 8946 98 No
436 Oklahoma State University No 4522 3913 2181 29 57 12830 1658 5336 3344 800 3100 84 92 15.3 14 6433 48 No
754 Wheeling Jesuit College Yes 903 755 213 15 49 971 305 10500 4545 600 600 66 71 14.1 27 7494 72 No
In [23]:
Out[23]:
No     699
Yes     78
Name: Elite, dtype: int64
In [24]:
Out[24]:
Private No Yes
Elite
No 199 500
Yes 13 65
In [25]:
Observations:
- 83% (65 of 78) of the 'Elite' institutions are private.
- The distribution of Outstate tuition among Elite institutions is left-skewed: most Elite institutions charge high out-of-state tuition, with a tail of lower-tuition institutions.
- The median Outstate tuition at Elite institutions is much higher than at non-Elite institutions, suggesting that Elite institutions are less financially accessible to out-of-state students.
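Below is a sketch of how the Elite flag and the Elite x Private cross-tabulation could be built, assuming pandas and NumPy. 'add_elite' is a hypothetical helper. The five stand-in rows (Top10perc and Private only) are copied from tables in this notebook.

```python
import numpy as np
import pandas as pd


def add_elite(df: pd.DataFrame, cutoff: int = 50) -> pd.DataFrame:
    """Hypothetical helper: Elite = 'Yes' when Top10perc exceeds the cutoff."""
    out = df.copy()
    out["Elite"] = pd.Categorical(
        np.where(out["Top10perc"] > cutoff, "Yes", "No"), categories=["No", "Yes"]
    )
    return out


sample = pd.DataFrame(
    {
        "Private": ["Yes", "Yes", "No", "No", "Yes"],
        "Top10perc": [60, 23, 74, 29, 96],
    },
    index=[
        "Agnes Scott College",
        "Abilene Christian University",
        "University of Virginia",
        "Oklahoma State University",
        "Massachusetts Institute of Technology",
    ],
)

flagged = add_elite(sample)
elite_by_private = pd.crosstab(flagged["Elite"], flagged["Private"])
# Share of Elite institutions that are private (65 / 78 ≈ 0.83 on the full data).
private_share = elite_by_private.loc["Yes", "Yes"] / elite_by_private.loc["Yes"].sum()
print(elite_by_private)
print(round(private_share, 2))
```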
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.5 - Histograms

The Freedman-Diaconis rule is used to calculate bin widths.
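The Freedman-Diaconis rule sets the bin width to h = 2 * IQR * n^(-1/3), where n is the number of observations. Below is a minimal NumPy sketch with hypothetical helper names. NumPy also implements this rule as the "fd" option of np.histogram_bin_edges, and the last line uses it as a cross-check.

```python
import numpy as np


def fd_bin_width(x) -> float:
    """Freedman-Diaconis bin width: 2 * IQR / cube_root(n)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) / np.cbrt(x.size)


def fd_bin_count(x) -> int:
    """Number of equal-width bins of that width needed to span the data range."""
    x = np.asarray(x, dtype=float)
    return int(np.ceil((x.max() - x.min()) / fd_bin_width(x)))


values = [1, 2, 3, 4, 5, 6, 7, 20]  # toy data: IQR = 3.5, n = 8
print(fd_bin_width(values), fd_bin_count(values))
# Cross-check against NumPy's built-in 'fd' estimator (k bins -> k + 1 edges).
print(len(np.histogram_bin_edges(values, bins="fd")) - 1)
```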

In [26]:
In [27]:
Observations:
- Out-of-state tuition and Room.Board expenses are slightly positively skewed.
- 'Books', students' 'Personal' expenses and 'Expend' (instructional expenditure per student) are strongly positively skewed (skewness ≈ 3.49, 1.74 and 3.46).
- Median total expenditure including out-of-state tuition (TExp.with.O = Outstate + Room.Board + Books + Personal; TExp.without.O excludes Outstate) is $16,079.
  For comparison, the US Census median household income in 1995 was ≈ $34,000.
In [28]:
In [29]:
Out[29]:
Outstate Room.Board Books Personal TExp.without.O TExp.with.O Expend
skew 0.509278 0.477356 3.485025 1.742497 0.567391 0.456621 3.459322
kurt -0.413832 -0.187553 28.333097 7.124017 0.636845 -0.370425 18.771500
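The total-expenditure columns are not defined explicitly in the notebook. The values shown later for the most sought-after institutions are consistent with TExp.without.O = Room.Board + Books + Personal and TExp.with.O = TExp.without.O + Outstate. For MIT, 5975 + 725 + 1600 = 8300, and 8300 + 20100 = 28400.

Below is a sketch under that assumption, followed by the skew/kurtosis summary. In pandas, skew and kurt are bias-corrected sample statistics. kurt is excess kurtosis (0 for a normal distribution), which explains the negative values in the table above.

```python
import pandas as pd


def add_total_expenditure(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed definitions, inferred from the values in this notebook's tables."""
    out = df.copy()
    out["TExp.without.O"] = out[["Room.Board", "Books", "Personal"]].sum(axis=1)
    out["TExp.with.O"] = out["TExp.without.O"] + out["Outstate"]
    return out


# Three rows (MIT, Yale, Harvard) copied from the tables in this notebook.
costs = pd.DataFrame(
    {
        "Outstate": [20100, 19840, 18485],
        "Room.Board": [5975, 6510, 6410],
        "Books": [725, 630, 500],
        "Personal": [1600, 2115, 1920],
    },
    index=["Massachusetts Institute of Technology", "Yale University", "Harvard University"],
)

costs = add_total_expenditure(costs)
# Sample skewness and excess kurtosis per column (kurt needs at least 4 rows,
# so it is NaN for this 3-row stand-in).
shape = costs.agg(["skew", "kurt"])
print(costs[["TExp.without.O", "TExp.with.O"]])
print(shape)
```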
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
In [30]:
Observations:
- PhD and Terminal are heavily left-skewed: at most institutions, a large share of faculty hold PhDs or terminal degrees.
  PhD has one value above 100% (103, Texas A&M University at Galveston), which is likely a data-entry error.
- Few colleges have a student-faculty ratio above 20 (the 95th percentile of S.F.Ratio is 21).
- Graduation rates vary widely: 17.63% of institutions (137 of 777) have a graduation rate below 50%.
- The interquartile range of perc.alumni (percentage of alumni who donate) runs from 13% (Q1) to 31% (Q3).

Note: see workings below for calculations

'#################### workings

PhD > 100
In [31]:
Out[31]:
[(582, 'Texas A&M University at Galveston')]
Graduation rate
In [32]:
Out[32]:
count mean std min 25% 50% 75% max
Grad.Rate 777.0 65.46332 17.17771 10.0 53.0 65.0 78.0 118.0
In [33]:
Out[33]:
137
In [34]:
Out[34]:
False    82.368082
True     17.631918
Name: Grad.Rate, dtype: float64
Donor alumni
In [35]:
Out[35]:
count mean std min 25% 50% 75% max
perc.alumni 777.0 22.743887 12.391801 0.0 13.0 21.0 31.0 64.0

'#################### workings'
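Below is a sketch of the three calculations in the workings above, assuming pandas. The helper names are hypothetical. The stand-in values are copied from rows shown earlier in this notebook. The PhD value 103 is the maximum reported in the summary statistics, and Out[31] attributes it to Texas A&M University at Galveston.

```python
import pandas as pd


def above_limit(s: pd.Series, limit: float = 100) -> list:
    """Index labels whose value exceeds a logical maximum (e.g. a percentage > 100)."""
    return list(s.index[s > limit])


def share_below(s: pd.Series, cutoff: float) -> tuple:
    """Count and percentage of values strictly below the cutoff."""
    below = s < cutoff
    return int(below.sum()), 100 * below.mean()


phd = pd.Series(
    {"Abilene Christian University": 70, "Adelphi University": 29,
     "Texas A&M University at Galveston": 103}
)
names = ["Abilene Christian University", "Adelphi University", "Adrian College",
         "Agnes Scott College", "Alaska Pacific University"]
grad_rate = pd.Series([60, 56, 54, 59, 15], index=names)
perc_alumni = pd.Series([12, 16, 30, 37, 2], index=names)

print(above_limit(phd))
print(share_below(grad_rate, 50))                    # full data: 137 of 777, i.e. 17.63%
print(perc_alumni.quantile([0.25, 0.75]).tolist())   # full data: Q1 = 13, Q3 = 31
```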

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.6 - Further data exploration

a) Spending patterns - private vs non-private
In [36]:
Out[36]:
Count Proportion Median
Private
No 212 0.272844 1649
Yes 565 0.727156 1100
In [37]:
Out[37]:
Private
No     106
Yes    103
Name: Personal, dtype: int64
In [38]:
In [39]:
Observations:
- The distribution of Personal spending is highly positively skewed for private institutions and moderately positively skewed for non-private ones.
- Median personal spending by students is higher at non-private institutions ($1,649) than at private ones ($1,100).
- Nearly the same number of private (103) and non-private (106) institutions have Personal spending above $1,649 (the non-private median). Because there are far more private institutions (565 vs 212), this is about 18% of private versus 50% of non-private institutions.
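Below is a sketch of the Count / Proportion / Median table and the above-threshold counts, assuming pandas. The helper names are hypothetical. The seven (Private, Personal) rows are copied from tables in this notebook.

```python
import pandas as pd


def personal_spending_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, proportion and median Personal spending for each Private group."""
    grouped = df.groupby("Private")["Personal"]
    summary = pd.DataFrame({"Count": grouped.size(), "Median": grouped.median()})
    summary["Proportion"] = summary["Count"] / summary["Count"].sum()
    return summary[["Count", "Proportion", "Median"]]


def count_above(df: pd.DataFrame, threshold: float) -> pd.Series:
    """Institutions per Private group whose Personal spending exceeds the threshold."""
    return df.loc[df["Personal"] > threshold].groupby("Private").size()


# Seven rows (Private, Personal) copied from tables in this notebook.
spend = pd.DataFrame({
    "Private": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No"],
    "Personal": [2200, 1500, 1165, 875, 3100, 1200, 1000],
})

summary = personal_spending_summary(spend)
# In the notebook the threshold is the non-private median ($1,649 on the full data).
above = count_above(spend, summary.loc["No", "Median"])
print(summary)
print(above)
```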
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b) Most sought-after colleges/universities

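Institutions with high values of:

  • Top10perc
  • Top25perc
  • Apps
  • Apps per enrolled student (apr = 100 × Apps / Enroll, rounded)
  • Enrolled students per acceptance (enr = 100 × Enroll / Accept, rounded)

The tables below also show acr = 100 × Accept / Apps (acceptance rate, rounded).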
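The formulas for apr, acr and enr are not shown in the notebook. The values in the tables that follow are consistent with 100 × the ratio, rounded to the nearest integer. For MIT, 100 × 6411 / 1078 ≈ 594.7, shown as 595. For Harvey Mudd, 100 × 1377 / 178 ≈ 773.6, shown as 774, which rules out truncation. Below is a sketch under that assumption; 'add_admission_ratios' is a hypothetical name.

```python
import pandas as pd


def add_admission_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed definitions, inferred from the values shown in this notebook."""
    out = df.copy()
    # Applications per enrolled student, x100.
    out["apr"] = (100 * out["Apps"] / out["Enroll"]).round().astype("int32")
    # Acceptance rate, %.
    out["acr"] = (100 * out["Accept"] / out["Apps"]).round().astype("int32")
    # Enrolled per accepted (yield), %.
    out["enr"] = (100 * out["Enroll"] / out["Accept"]).round().astype("int32")
    return out


rows = pd.DataFrame(
    {"Apps": [6411, 1377, 3366], "Accept": [2140, 572, 1752], "Enroll": [1078, 178, 232]},
    index=["Massachusetts Institute of Technology", "Harvey Mudd College",
           "Rutgers State University at Camden"],
)
ratios = add_admission_ratios(rows)
print(ratios[["apr", "acr", "enr"]])
```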
In [40]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 777 entries, (0, 'Abilene Christian University') to (776, 'York College of Pennsylvania')
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Apps         777 non-null    int64
 1   Accept       777 non-null    int64
 2   Enroll       777 non-null    int64
 3   Top10perc    777 non-null    int64
 4   Top25perc    777 non-null    int64
 5   F.Undergrad  777 non-null    int64
 6   apr          777 non-null    int32
 7   acr          777 non-null    int32
 8   enr          777 non-null    int32
dtypes: int32(3), int64(6)
memory usage: 100.8+ KB
In [41]:
(100, 9)
Out[41]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad apr acr enr
College
354 Massachusetts Institute of Technology 6411 2140 1078 96 99 4481 595 33 50
251 Harvey Mudd College 1377 572 178 95 100 654 774 42 31
605 University of California at Berkeley 19873 8252 3215 95 100 19532 618 42 39
775 Yale University 10705 2453 1317 95 99 5217 813 23 54
174 Duke University 13789 3893 1583 90 98 6188 871 28 41
250 Harvard University 13865 2165 1606 90 100 6862 863 16 74
459 Princeton University 13218 2042 1153 90 98 4540 1146 15 56
222 Georgia Institute of Technology 7837 4527 2276 89 99 8528 344 58 50
70 Brown University 12586 3239 1462 87 95 5643 861 26 45
158 Dartmouth College 8587 2273 1087 87 99 3918 790 26 48
In [42]:
Out[42]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad apr acr enr
College
251 Harvey Mudd College 1377 572 178 95 100 654 774 42 31
605 University of California at Berkeley 19873 8252 3215 95 100 19532 618 42 39
250 Harvard University 13865 2165 1606 90 100 6862 863 16 74
606 University of California at Irvine 15698 10775 2478 85 100 12677 633 69 23
663 University of Pennsylvania 12394 5232 2464 85 100 9205 503 42 47
60 Bowdoin College 3356 1019 418 76 100 1490 803 30 41
562 SUNY at Buffalo 15039 9649 3087 36 100 13963 487 64 32
354 Massachusetts Institute of Technology 6411 2140 1078 96 99 4481 595 33 50
775 Yale University 10705 2453 1317 95 99 5217 813 23 54
222 Georgia Institute of Technology 7837 4527 2276 89 99 8528 344 58 50
In [43]:
Out[43]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad apr acr enr
College
483 Rutgers at New Brunswick 48094 26330 4520 36 79 21401 1064 55 17
461 Purdue University at West Lafayette 21804 18744 5874 29 60 26213 371 86 31
59 Boston University 20192 13007 3810 45 80 14971 530 64 29
605 University of California at Berkeley 19873 8252 3215 95 100 19532 618 42 39
445 Pennsylvania State Univ. Main Campus 19315 10344 3450 48 93 28938 560 54 33
In [44]:
Out[44]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad apr acr enr
College
484 Rutgers State University at Camden 3366 1752 232 27 79 2585 1451 52 13
578 Talladega College 4414 1500 335 30 60 908 1318 34 22
570 SUNY College at New Paltz 8399 3609 656 19 53 4658 1280 43 18
210 Franklin Pierce College 5187 4471 446 3 14 1818 1163 86 10
485 Rutgers State University at Newark 5785 2690 499 26 62 4005 1159 46 19
In [45]:
Out[45]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad apr acr enr
College
77 California Lutheran University 563 247 247 23 52 1427 228 44 100
66 Brewton-Parker College 1436 1228 1202 10 26 1320 119 86 98
376 Mississippi University for Women 480 405 380 19 46 1673 126 84 94
447 Peru State College 701 501 458 10 40 959 153 71 91
275 Indiana Wesleyan University 735 423 366 20 48 2448 201 58 87
In [46]:
Out[46]:
16
Most Sought-after Colleges/Universities (Final list)
In [47]:
Out[47]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad apr acr enr
College
354 Massachusetts Institute of Technology 6411 2140 1078 96 99 4481 595 33 50
775 Yale University 10705 2453 1317 95 99 5217 813 23 54
250 Harvard University 13865 2165 1606 90 100 6862 863 16 74
459 Princeton University 13218 2042 1153 90 98 4540 1146 15 56
70 Brown University 12586 3239 1462 87 95 5643 861 26 45
158 Dartmouth College 8587 2273 1087 87 99 3918 790 26 48
663 University of Pennsylvania 12394 5232 2464 85 100 9205 503 42 47
763 Williams College 4186 1245 526 81 96 1988 796 30 42
733 Wellesley College 2895 1249 579 80 96 2195 500 43 46
660 University of Notre Dame 7700 3700 1906 79 96 7671 404 48 52
144 Columbia University 6756 1930 871 78 96 3376 776 29 45
159 Davidson College 2373 956 452 77 96 1601 525 40 47
651 University of North Carolina at Chapel Hill 14596 5985 3331 75 92 14609 438 41 56
693 University of Virginia 15849 5384 2678 74 95 11278 592 34 50
221 Georgetown University 11115 2881 1390 71 93 5881 800 26 48
238 Grove City College 2491 1110 573 57 88 2213 435 45 52
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c) Further analysis of most sought-after colleges/universities
In [48]:
Out[48]:
Yes    14
No      2
Name: Private, dtype: int64
In [49]:
In [50]:
Out[50]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate TExp.without.O TExp.with.O
count 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777
mean 3001 2018 779 27 55 3699 855 10440 4357 549 1340 72 79 14 22 9660 65 6247 16688
std 3870 2451 929 17 19 4850 1522 4023 1096 165 677 16 14 3 12 5221 17 1216 4675
min 81 72 35 1 9 139 1 2340 1780 96 250 8 24 2 0 3186 10 3452 6604
5% 329 272 118 7 25 509 20 4601 2735 350 500 43 52 8 6 4795 37 4450 9846
10% 457 361 154 10 30 641 35 5568 3051 400 600 50 59 9 8 5558 44 4809 11006
25% 776 604 242 15 41 992 95 7320 3597 470 850 62 71 11 13 6751 53 5400 13279
50% 1558 1110 434 23 54 1707 353 9990 4200 500 1200 75 82 13 21 8377 65 6100 16079
75% 3624 2424 902 35 69 4005 967 12925 5050 600 1700 85 92 16 31 10830 78 6958 19650
90% 7674 4814 1903 50 85 10024 2016 16552 5950 700 2200 92 96 19 40 14841 89 7922 23430
95% 11066 6979 2756 65 93 14477 3303 18497 6381 765 2488 95 98 21 46 17974 94 8392 26029
max 48094 26330 6392 96 100 31643 21836 21700 8124 2340 6800 103 100 39 64 56233 118 12330 29095
Out[50]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate Elite TExp.without.O TExp.with.O
College
354 Massachusetts Institute of Technology Yes 6411 2140 1078 96 99 4481 28 20100 5975 725 1600 99 99 10.1 35 33541 94 Yes 8300 28400
775 Yale University Yes 10705 2453 1317 95 99 5217 83 19840 6510 630 2115 96 96 5.8 49 40386 99 Yes 9255 29095
250 Harvard University Yes 13865 2165 1606 90 100 6862 320 18485 6410 500 1920 97 97 9.9 52 37219 100 Yes 8830 27315
459 Princeton University Yes 13218 2042 1153 90 98 4540 146 19900 5910 675 1575 91 96 8.4 54 28320 99 Yes 8160 28060
70 Brown University Yes 12586 3239 1462 87 95 5643 349 19528 5926 720 1100 99 100 7.6 39 20440 97 Yes 7746 27274
158 Dartmouth College Yes 8587 2273 1087 87 99 3918 32 19545 6070 550 1100 95 99 4.7 49 29619 98 Yes 7720 27265
663 University of Pennsylvania Yes 12394 5232 2464 85 100 9205 531 17020 7270 500 1544 95 96 6.3 38 25765 93 Yes 9314 26334
763 Williams College Yes 4186 1245 526 81 96 1988 29 19629 5790 500 1200 94 99 9.0 64 22014 99 Yes 7490 27119
733 Wellesley College Yes 2895 1249 579 80 96 2195 156 18345 5995 500 700 94 98 10.6 51 21409 91 Yes 7195 25540
660 University of Notre Dame Yes 7700 3700 1906 79 96 7671 30 16850 4400 600 1350 96 92 13.1 46 13936 97 Yes 6350 23200
144 Columbia University Yes 6756 1930 871 78 96 3376 55 18624 6664 550 300 97 98 5.9 21 30639 99 Yes 7514 26138
159 Davidson College Yes 2373 956 452 77 96 1601 6 17295 5070 600 1011 95 97 12.0 46 17581 94 Yes 6681 23976
651 University of North Carolina at Chapel Hill No 14596 5985 3331 75 92 14609 1100 8400 4200 550 1200 88 93 8.9 23 15893 83 Yes 5950 14350
693 University of Virginia No 15849 5384 2678 74 95 11278 114 12212 3792 500 1000 90 92 9.5 22 13597 95 Yes 5292 17504
221 Georgetown University Yes 11115 2881 1390 71 93 5881 406 18300 7131 670 1700 91 92 7.2 27 19635 95 Yes 9501 27801
238 Grove City College Yes 2491 1110 573 57 88 2213 35 5224 3048 525 350 65 65 18.4 18 4957 100 Yes 3923 9147
Observations:
- 14 of the 16 (87.5%) MSAs (most sought-after institutions) are private.
- 15 (93.75%) have a Grad.Rate at or above the 90th percentile of the full sample. The remaining one (University of North Carolina at Chapel Hill, 83%) is at about the 85th percentile.
- Outstate tuition for 13 (81.25%) MSAs is at or above the 90th percentile.
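The percentile tables in the workings appear to use the "weak" definition: the percentage of values less than or equal to a score. This is what scipy.stats.percentileofscore returns with kind='weak'. That reading is an inference. It gives 99.87 for a Grad.Rate of 100 only if exactly one of the 777 values (the maximum, 118) is higher, since 776 / 777 ≈ 99.87.

Below is a pandas sketch with hypothetical helper names. It uses the MSA percentiles listed in the workings for the 90th-percentile split.

```python
import pandas as pd


def weak_percentile(values, score) -> float:
    """Percentage of values <= score, rounded to 2 decimals."""
    s = pd.Series(values)
    return round(100 * float((s <= score).mean()), 2)


def split_at(percentiles, cut: float = 90) -> tuple:
    """Count of entries below the cut and at or above it."""
    p = pd.Series(percentiles)
    return int((p < cut).sum()), int((p >= cut).sum())


# Grad.Rate percentiles of the 16 MSAs, copied from the workings.
msa_pct = [99.87, 99.87, 98.58, 98.58, 98.58, 98.58, 97.94, 97.3,
           97.3, 95.62, 95.62, 94.98, 94.98, 94.47, 93.05, 84.81]

print(weak_percentile([10, 20, 30, 40], 30))  # 3 of 4 values are <= 30
print(split_at(msa_pct))
```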
#'################ workings
Percentile table (single variable)
In [51]:
< 90 :  1 (6.25)
>= 90 :  15 (93.75)
Out[51]:
College Grad.Rate Percentile Percentile_Main
2 Harvard University 100 99.87 99.49
15 Grove City College 100 99.87 93.56
1 Yale University 99 98.58 99.87
3 Princeton University 99 98.58 99.49
7 Williams College 99 98.58 98.07
10 Columbia University 99 98.58 97.68
5 Dartmouth College 98 97.94 98.97
4 Brown University 97 97.3 98.97
9 University of Notre Dame 97 97.3 97.81
13 University of Virginia 95 95.62 96.65
14 Georgetown University 95 95.62 96.53
0 Massachusetts Institute of Technology 94 94.98 100
11 Davidson College 94 94.98 97.55
6 University of Pennsylvania 93 94.47 98.58
8 Wellesley College 91 93.05 97.94
12 University of North Carolina at Chapel Hill 83 84.81 97.17
In [52]:
In [53]:
<ipython-input-53-0d67190b2fe9>:5: UserWarning: p-value floored: true value smaller than 0.001
  anderson_ksamp([base_df[var], subset_df[var]], midrank=True)
Out[53]:
Anderson_ksampResult(statistic=39.535210262955914, critical_values=array([0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]), significance_level=0.001)
Out[53]:
KstestResult(statistic=0.8551319176319176, pvalue=1.9251267247000214e-13)
In [54]:
'#####################################################################
Boxplots
In [55]:
In [56]:
#'#####################################################################
Statistical test
In [57]:
Out[57]:
KS (p-value) AD (min sig lvl) Significant
Apps 0.00000 0.00100 Y
Accept 0.00307 0.00256 Y
Enroll 0.00010 0.00100 Y
Top10perc 0.00000 0.00100 Y
Top25perc 0.00000 0.00100 Y
F.Undergrad 0.00026 0.00100 Y
P.Undergrad 0.03205 0.00541 Y
Outstate 0.00000 0.00100 Y
Room.Board 0.00004 0.00100 Y
Books 0.15973 0.06639 N
Personal 0.86654 0.25000 N
PhD 0.00000 0.00100 Y
Terminal 0.00000 0.00100 Y
S.F.Ratio 0.00000 0.00100 Y
perc.alumni 0.00023 0.00100 Y
Expend 0.00000 0.00100 Y
Grad.Rate 0.00000 0.00100 Y
TExp.without.O 0.00095 0.00100 Y
TExp.with.O 0.00000 0.00100 Y
'################ workings'
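Below is a sketch of the two-sample comparison behind the table, assuming SciPy. anderson_ksamp appears in the warning text above. ks_2samp is assumed for the KS test. anderson_ksamp floors its p-value at 0.001 and caps it at 0.25, which matches the 0.00100 and 0.25000 entries.

The Y / N / '-' rule is inferred from the table rows: 'Y' when both p-values are below 0.05, 'N' when neither is, and '-' when the tests disagree. The helper name and the toy arrays are hypothetical.

```python
import warnings

import numpy as np
from scipy.stats import anderson_ksamp, ks_2samp


def compare_to_base(base, subset, alpha: float = 0.05):
    """Return (KS p-value, AD p-value, flag) for subset vs. base sample.

    flag: 'Y' both tests significant, 'N' neither, '-' the tests disagree
    (rule inferred from the notebook's table, not documented there).
    """
    ks_p = ks_2samp(base, subset).pvalue
    with warnings.catch_warnings():
        # anderson_ksamp warns when its p-value is floored (0.001) or capped (0.25).
        warnings.simplefilter("ignore", UserWarning)
        ad_p = anderson_ksamp([base, subset], midrank=True)[2]  # third field: p-value
    ks_sig, ad_sig = ks_p < alpha, ad_p < alpha
    if ks_sig and ad_sig:
        flag = "Y"
    elif not ks_sig and not ad_sig:
        flag = "N"
    else:
        flag = "-"
    return float(ks_p), float(ad_p), flag


base = np.arange(100.0)
print(compare_to_base(base, base + 500))   # disjoint samples
print(compare_to_base(base, base.copy()))  # identical samples
```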
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
d) Top 20 colleges by applications
In [58]:
In [59]:
Out[59]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate TExp.without.O TExp.with.O
count 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777
mean 3001 2018 779 27 55 3699 855 10440 4357 549 1340 72 79 14 22 9660 65 6247 16688
std 3870 2451 929 17 19 4850 1522 4023 1096 165 677 16 14 3 12 5221 17 1216 4675
min 81 72 35 1 9 139 1 2340 1780 96 250 8 24 2 0 3186 10 3452 6604
5% 329 272 118 7 25 509 20 4601 2735 350 500 43 52 8 6 4795 37 4450 9846
10% 457 361 154 10 30 641 35 5568 3051 400 600 50 59 9 8 5558 44 4809 11006
25% 776 604 242 15 41 992 95 7320 3597 470 850 62 71 11 13 6751 53 5400 13279
50% 1558 1110 434 23 54 1707 353 9990 4200 500 1200 75 82 13 21 8377 65 6100 16079
75% 3624 2424 902 35 69 4005 967 12925 5050 600 1700 85 92 16 31 10830 78 6958 19650
90% 7674 4814 1903 50 85 10024 2016 16552 5950 700 2200 92 96 19 40 14841 89 7922 23430
95% 11066 6979 2756 65 93 14477 3303 18497 6381 765 2488 95 98 21 46 17974 94 8392 26029
max 48094 26330 6392 96 100 31643 21836 21700 8124 2340 6800 103 100 39 64 56233 118 12330 29095
Out[59]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate Elite TExp.without.O TExp.with.O apr acr enr
College
Rutgers at New Brunswick No 48094 26330 4520 36 79 21401 3712 7410 4748 690 2009 90 95 19.5 19 10474 77 No 7447 14857 1064 55 17
Purdue University at West Lafayette No 21804 18744 5874 29 60 26213 4065 9556 3990 570 1060 86 86 18.2 15 8604 67 No 5620 15176 371 86 31
Boston University Yes 20192 13007 3810 45 80 14971 3113 18420 6810 475 1025 80 81 11.9 16 16836 72 No 8310 26730 530 64 29
University of California at Berkeley No 19873 8252 3215 95 100 19532 2061 11648 6246 636 1933 93 97 15.8 10 13919 78 Yes 8815 20463 618 42 39
Pennsylvania State Univ. Main Campus No 19315 10344 3450 48 93 28938 2025 10645 4060 512 2394 77 96 18.1 19 8992 63 No 6966 17611 560 54 33
University of Michigan at Ann Arbor No 19152 12940 4893 66 92 22045 1339 15732 4659 476 1600 90 98 11.5 26 14847 87 Yes 6735 22467 391 68 38
Michigan State University No 18114 15096 6180 23 57 26640 4120 10658 3734 504 600 93 95 14.0 9 10520 71 No 4838 15496 293 83 41
Indiana University at Bloomington No 16587 13243 5873 25 72 24763 2717 9766 3990 600 2000 77 88 21.3 24 8686 68 No 6590 16356 282 80 44
University of Virginia No 15849 5384 2678 74 95 11278 114 12212 3792 500 1000 90 92 9.5 22 13597 95 Yes 5292 17504 592 34 50
Virginia Tech No 15712 11719 4277 29 53 18511 604 10260 3176 740 2200 85 89 13.8 20 8944 73 No 6116 16376 367 75 36
University of California at Irvine No 15698 10775 2478 85 100 12677 864 12024 5302 790 1818 96 96 16.1 11 15934 66 Yes 7910 19934 633 69 23
SUNY at Buffalo No 15039 9649 3087 36 100 13963 3124 6550 4731 708 957 90 97 13.6 15 11177 56 No 6396 12946 487 64 32
University of Illinois - Urbana No 14939 11652 5705 52 88 25422 911 7560 4574 500 1982 87 90 17.4 13 8559 81 Yes 7056 14616 262 78 49
University of Wisconsin at Madison No 14901 10932 4631 36 80 23945 2200 9096 4290 535 1545 93 96 11.5 20 11006 72 No 6370 15466 322 73 42
University of Texas at Austin No 14752 9572 5329 48 85 30017 5189 5130 3309 650 3140 91 99 19.7 11 7837 65 No 7099 12229 277 65 56
University of North Carolina at Chapel Hill No 14596 5985 3331 75 92 14609 1100 8400 4200 550 1200 88 93 8.9 23 15893 83 Yes 5950 14350 438 41 56
Texas A&M Univ. at College Station No 14474 10519 6392 49 85 31643 2798 5130 3412 600 2144 89 91 23.1 29 8471 69 No 6156 11286 226 73 61
SUNY at Binghamton No 14463 6166 1757 60 94 8544 671 6550 4598 700 1000 83 100 18.0 15 8055 80 Yes 6298 12848 823 43 28
University of Delaware Yes 14446 10516 3252 22 57 14130 4522 10220 4230 530 1300 82 87 18.3 15 10650 75 No 6060 16280 444 73 31
University of Massachusetts at Amherst No 14438 12414 3816 12 39 16282 1940 8566 3897 500 1400 88 92 16.7 15 10276 68 No 5797 14363 378 86 31
Observations:
- 18 of the top 20 colleges by Apps are non-private.
- Institutions with high applications (HAIs) also accept many applicants: all 20 are at or above the 90th percentile of Accept.
- HAIs also have high enrollment numbers, but their enrollment rates (enr) are distributed much like those of the overall sample.
  This suggests that, although applications are high, proportionally no more admitted students go on to enroll; many applications could be backup applications.
- HAIs differ significantly from the overall sample (KS and AD tests both significant) for all variables except Outstate, Room.Board, TExp.without.O and TExp.with.O (neither test significant) and Books and Personal (the two tests disagree).

Note: See workings below.
#'################ workings
Percentile table (single variable)
In [60]:
< 90 :  0 (0.0)
>= 90 :  20 (100.0)
Out[60]:
College Accept Percentile Percentile_Main
0 Rutgers at New Brunswick 26330 100 100
1 Purdue University at West Lafayette 18744 99.87 99.87
6 Michigan State University 15096 99.74 99.23
7 Indiana University at Bloomington 13243 99.61 99.1
2 Boston University 13007 99.49 99.74
5 University of Michigan at Ann Arbor 12940 99.36 99.36
19 University of Massachusetts at Amherst 12414 99.23 97.55
9 Virginia Tech 11719 99.1 98.84
12 University of Illinois - Urbana 11652 98.97 98.46
13 University of Wisconsin at Madison 10932 98.84 98.33
10 University of California at Irvine 10775 98.71 98.71
16 Texas A&M Univ. at College Station 10519 98.58 97.94
18 University of Delaware 10516 98.46 97.68
4 Pennsylvania State Univ. Main Campus 10344 98.33 99.49
11 SUNY at Buffalo 9649 97.94 98.58
14 University of Texas at Austin 9572 97.81 98.2
3 University of California at Berkeley 8252 96.91 99.61
17 SUNY at Binghamton 6166 93.95 97.81
15 University of North Carolina at Chapel Hill 5985 93.56 98.07
8 University of Virginia 5384 92.02 98.97
In [61]:
In [62]:
<ipython-input-62-0d67190b2fe9>:5: UserWarning: p-value floored: true value smaller than 0.001
  anderson_ksamp([base_df[var], subset_df[var]], midrank=True)
Out[62]:
Anderson_ksampResult(statistic=60.848727424238014, critical_values=array([0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]), significance_level=0.001)
Out[62]:
KstestResult(statistic=0.918918918918919, pvalue=1.1102230246251565e-16)
In [63]:
#'#####################################################################
Boxplots
In [64]:
#'#####################################################################
Statistical test
In [65]:
Out[65]:
KS (p-value) AD (min sig lvl) Significant
Apps 0.00000 0.00100 Y
Accept 0.00000 0.00100 Y
Enroll 0.00000 0.00100 Y
Top10perc 0.00038 0.00100 Y
Top25perc 0.00000 0.00100 Y
F.Undergrad 0.00000 0.00100 Y
P.Undergrad 0.00000 0.00100 Y
Outstate 0.39811 0.25000 N
Room.Board 0.55443 0.25000 N
Books 0.13598 0.03599 -
Personal 0.09726 0.03611 -
PhD 0.00001 0.00100 Y
Terminal 0.00002 0.00100 Y
S.F.Ratio 0.04855 0.04648 Y
perc.alumni 0.04120 0.02025 Y
Expend 0.00159 0.00222 Y
Grad.Rate 0.00549 0.00890 Y
TExp.without.O 0.14501 0.17326 N
TExp.with.O 0.49792 0.25000 N
#'################ workings'
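Below is a sketch of the top-N selection and the Accept percentile check, assuming pandas. Series.rank(method='max', pct=True) gives, for each value, the share of values less than or equal to it (a "weak" percentile). Whether the notebook computed percentiles this way is an assumption.

The helper name and the eight stand-in rows (copied from tables in this notebook) are hypothetical. n=3 keeps the example small, whereas the notebook uses the top 20.

```python
import pandas as pd


def high_application_summary(df: pd.DataFrame, n: int = 20, cut: float = 90):
    """Top-n institutions by Apps, their Private counts, and how many sit at or
    above the given percentile of Accept within the full sample."""
    accept_pct = df["Accept"].rank(method="max", pct=True) * 100  # % of values <= x
    top = df.nlargest(n, "Apps")
    private_counts = top["Private"].value_counts()
    n_high_accept = int((accept_pct.loc[top.index] >= cut).sum())
    return top, private_counts, n_high_accept


colleges = pd.DataFrame(
    {
        "Private": ["No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"],
        "Apps": [48094, 21804, 20192, 1660, 2186, 1428, 417, 193],
        "Accept": [26330, 18744, 13007, 1232, 1924, 1097, 349, 146],
    },
    index=[
        "Rutgers at New Brunswick", "Purdue University at West Lafayette",
        "Boston University", "Abilene Christian University", "Adelphi University",
        "Adrian College", "Agnes Scott College", "Alaska Pacific University",
    ],
)

top, private_counts, n_high_accept = high_application_summary(colleges, n=3)
print(top.index.tolist())
print(private_counts)
print(n_high_accept)
```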
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
e) Further analysis of Elite colleges
In [66]:
In [67]:
Out[67]:
(78, 21)
Out[67]:
Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate TExp.without.O TExp.with.O
count 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777 777
mean 3001 2018 779 27 55 3699 855 10440 4357 549 1340 72 79 14 22 9660 65 6247 16688
std 3870 2451 929 17 19 4850 1522 4023 1096 165 677 16 14 3 12 5221 17 1216 4675
min 81 72 35 1 9 139 1 2340 1780 96 250 8 24 2 0 3186 10 3452 6604
5% 329 272 118 7 25 509 20 4601 2735 350 500 43 52 8 6 4795 37 4450 9846
10% 457 361 154 10 30 641 35 5568 3051 400 600 50 59 9 8 5558 44 4809 11006
25% 776 604 242 15 41 992 95 7320 3597 470 850 62 71 11 13 6751 53 5400 13279
50% 1558 1110 434 23 54 1707 353 9990 4200 500 1200 75 82 13 21 8377 65 6100 16079
75% 3624 2424 902 35 69 4005 967 12925 5050 600 1700 85 92 16 31 10830 78 6958 19650
90% 7674 4814 1903 50 85 10024 2016 16552 5950 700 2200 92 96 19 40 14841 89 7922 23430
95% 11066 6979 2756 65 93 14477 3303 18497 6381 765 2488 95 98 21 46 17974 94 8392 26029
max 48094 26330 6392 96 100 31643 21836 21700 8124 2340 6800 103 100 39 64 56233 118 12330 29095
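The summary above has count, mean, std, min, the 5% to 95% percentiles and max. `DataFrame.describe` produces that layout when given a custom `percentiles` list. Below is a sketch that assumes the numeric College columns are passed in. Rounding to whole numbers mimics the display above, but how the original cell formatted its output is an assumption. Toy data is used.

```python
import numpy as np
import pandas as pd

# Percentile rows shown in the output above
PERCENTILES = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]


def summary(df):
    """describe() with extra percentiles, rounded to whole numbers for display."""
    return df.select_dtypes("number").describe(percentiles=PERCENTILES).round(0)


# Toy stand-in data, NOT the College dataset
df = pd.DataFrame({"x": np.arange(1, 101)})
print(summary(df))
```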
Out[67]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate Elite TExp.without.O TExp.with.O
SN College
0 Massachusetts Institute of Technology Yes 6411 2140 1078 96 99 4481 28 20100 5975 725 1600 99 99 10.1 35 33541 94 Yes 8300 28400
1 Harvey Mudd College Yes 1377 572 178 95 100 654 5 17230 6690 700 900 100 100 8.2 46 21569 100 Yes 8290 25520
2 University of California at Berkeley No 19873 8252 3215 95 100 19532 2061 11648 6246 636 1933 93 97 15.8 10 13919 78 Yes 8815 20463
3 Yale University Yes 10705 2453 1317 95 99 5217 83 19840 6510 630 2115 96 96 5.8 49 40386 99 Yes 9255 29095
4 Harvard University Yes 13865 2165 1606 90 100 6862 320 18485 6410 500 1920 97 97 9.9 52 37219 100 Yes 8830 27315
5 Duke University Yes 13789 3893 1583 90 98 6188 53 18590 5950 625 1162 95 96 5.0 44 27206 97 Yes 7737 26327
6 Princeton University Yes 13218 2042 1153 90 98 4540 146 19900 5910 675 1575 91 96 8.4 54 28320 99 Yes 8160 28060
7 Georgia Institute of Technology No 7837 4527 2276 89 99 8528 654 6489 4438 795 1164 92 92 19.3 33 11271 70 Yes 6397 12886
8 Dartmouth College Yes 8587 2273 1087 87 99 3918 32 19545 6070 550 1100 95 99 4.7 49 29619 98 Yes 7720 27265
9 Brown University Yes 12586 3239 1462 87 95 5643 349 19528 5926 720 1100 99 100 7.6 39 20440 97 Yes 7746 27274
10 Pepperdine University Yes 3821 2037 680 86 96 2488 625 18200 6770 500 700 95 98 11.6 13 16185 66 Yes 7970 26170
11 University of California at Irvine No 15698 10775 2478 85 100 12677 864 12024 5302 790 1818 96 96 16.1 11 15934 66 Yes 7910 19934
12 University of Pennsylvania Yes 12394 5232 2464 85 100 9205 531 17020 7270 500 1544 95 96 6.3 38 25765 93 Yes 9314 26334
13 Northwestern University Yes 12289 5200 1902 85 98 7450 45 16404 5520 759 1585 96 100 6.8 25 26385 92 Yes 7864 24268
14 Amherst College Yes 4302 992 418 83 96 1593 5 19760 5300 660 1598 93 98 8.4 63 21424 100 Yes 7558 27318
15 Williams College Yes 4186 1245 526 81 96 1988 29 19629 5790 500 1200 94 99 9.0 64 22014 99 Yes 7490 27119
16 Wellesley College Yes 2895 1249 579 80 96 2195 156 18345 5995 500 700 94 98 10.6 51 21409 91 Yes 7195 25540
17 University of Notre Dame Yes 7700 3700 1906 79 96 7671 30 16850 4400 600 1350 96 92 13.1 46 13936 97 Yes 6350 23200
18 Columbia University Yes 6756 1930 871 78 96 3376 55 18624 6664 550 300 97 98 5.9 21 30639 99 Yes 7514 26138
19 Davidson College Yes 2373 956 452 77 96 1601 6 17295 5070 600 1011 95 97 12.0 46 17581 94 Yes 6681 23976
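The two derived columns are not defined in this section, but every row above satisfies two sums. TExp.without.O = Room.Board + Books + Personal, and TExp.with.O = TExp.without.O + Outstate. For MIT, 5975 + 725 + 1600 = 8300 and 8300 + 20100 = 28400. Below is a sketch of that derivation, with the formulas inferred from the table. `add_total_expenses` is a hypothetical helper.

```python
import pandas as pd


def add_total_expenses(df):
    """Add total-expense columns (formulas inferred from the top-20 table)."""
    out = df.copy()
    # Cost of attendance excluding tuition
    out["TExp.without.O"] = out["Room.Board"] + out["Books"] + out["Personal"]
    # ... plus out-of-state tuition
    out["TExp.with.O"] = out["TExp.without.O"] + out["Outstate"]
    return out


# Two rows copied from the top-20 table above
sample = pd.DataFrame(
    {"Outstate": [20100, 6489], "Room.Board": [5975, 4438],
     "Books": [725, 795], "Personal": [1600, 1164]},
    index=["Massachusetts Institute of Technology", "Georgia Institute of Technology"],
)
print(add_total_expenses(sample)[["TExp.without.O", "TExp.with.O"]])
```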
'Elite' : colleges/universities where more than 50% of new students come from the top 10% of their high school class (Top10perc > 50).
'Top10perc' : % of new students from the top 10% of their high school class.
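Below is a minimal sketch of the Elite flag as defined above and of a top-n selection by Top10perc. `add_elite_flag` and `top_elite` are hypothetical helpers. The top-20 table above orders some ties in a way Top10perc alone does not explain: Harvey Mudd, Berkeley and Yale all have 95. The original may therefore have used a secondary sort key, which is unknown. Here a stable sort simply keeps the input order among ties.

```python
import numpy as np
import pandas as pd


def add_elite_flag(df, threshold=50):
    """'Yes' where more than `threshold` % of new students were top-10% of class."""
    out = df.copy()
    out["Elite"] = np.where(out["Top10perc"] > threshold, "Yes", "No")
    return out


def top_elite(df, n=20):
    """The n Elite rows with the highest Top10perc (stable sort keeps tie order)."""
    elite = df[df["Elite"] == "Yes"]
    return elite.sort_values("Top10perc", ascending=False, kind="mergesort").head(n)


# Top10perc values for five colleges shown earlier in this notebook
colleges = pd.DataFrame(
    {"Top10perc": [23, 16, 60, 96, 90]},
    index=["Abilene Christian University", "Adelphi University",
           "Agnes Scott College", "Massachusetts Institute of Technology",
           "Harvard University"],
)
flagged = add_elite_flag(colleges)
print(flagged)
print(top_elite(flagged, n=2))
```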

Observations:
- There are 78 'Elite' institutions (about 10% of 777).
- The distribution of almost every variable differs between 'Elite' colleges and the dataset as a whole. In the statistical tests below, the only variable without a significant difference is 'Personal'.

Top 20 Elite:
- Unsurprisingly, the top 'Elite' institutions also have the highest proportions of students who graduated in the top 25% of their high school class. All 20 are at or above the 97th percentile of Top25perc.
- The minimum 'PhD' and 'Terminal' proportions are 91% and 92% respectively.
- 19 of the 20 institutions are at or above the 90th percentile for 'PhD' (proportion of faculty with PhDs).
- 18 of the 20 are at or above the 90th percentile for 'Terminal' (proportion of faculty with terminal degrees).
- 16 of the 20 have out-of-state tuition at or above the 90th percentile. The notable outliers are California-Irvine and California-Berkeley (69th and 66th percentiles respectively), and Georgia Institute of Technology is an extreme outlier at the 16th percentile.
- 70% of the Top 20 Elite have Room.Board expenses at or above the 85th percentile.
- The student-faculty ratio is generally lower than overall: 14 of the top 20 have an S.F.Ratio below the 25th percentile.
  Here again California-Irvine, California-Berkeley and Georgia Institute of Technology stand out, at the 73rd, 71st and 91st percentiles respectively.
- 15 of the 20 are at or above the 80th percentile for the proportion of alumni who donate (perc.alumni).
- Graduation rates are higher than the norm among the Elite institutions: 16 of the top 20 have a Grad.Rate above the 93rd percentile.
#'################ workings
Percentile table (single variable)
In [68]:
Out[68]:
'Top 20 elite colleges'
< 90 :  1 (5.0)
>= 90 :  19 (95.0)
Out[68]:
College PhD Percentile Percentile_Main
1 Harvey Mudd College 100 99.87 99.87
0 Massachusetts Institute of Technology 99 99.49 100
9 Brown University 99 99.49 98.97
4 Harvard University 97 98.46 99.49
18 Columbia University 97 98.46 97.68
3 Yale University 96 97.55 99.87
11 University of California at Irvine 96 97.55 98.58
13 Northwestern University 96 97.55 98.58
17 University of Notre Dame 96 97.55 97.81
5 Duke University 95 96.53 99.49
8 Dartmouth College 95 96.53 98.97
10 Pepperdine University 95 96.53 98.71
12 University of Pennsylvania 95 96.53 98.58
19 Davidson College 95 96.53 97.55
15 Williams College 94 94.47 98.07
16 Wellesley College 94 94.47 97.94
2 University of California at Berkeley 93 93.18 99.87
14 Amherst College 93 93.18 98.2
7 Georgia Institute of Technology 92 90.99 99.1
6 Princeton University 91 89.45 99.49
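The percentiles in this table appear to be "weak" percentile ranks over all 777 colleges: the share of colleges whose value is less than or equal to the given one. For example, PhD = 100 maps to 99.87, which is 776/777, since one college reports 103. Percentile_Main looks like the same rank computed for Top10perc: MIT's 96, the maximum, maps to 100. Below is a sketch of that computation using pandas `Series.rank(pct=True, method="max")`, together with the below / at-or-above-90 split. `weak_percentile` and `split_counts` are hypothetical helpers, and the method is inferred from the numbers rather than taken from the original code.

```python
import pandas as pd


def weak_percentile(values):
    """Percentage of observations <= each value; tied values share the higher rank."""
    return values.rank(pct=True, method="max").mul(100).round(2)


def split_counts(pct, cutoff=90):
    """Count (and % of entries) below / at-or-above a percentile cutoff."""
    n = len(pct)
    below = int((pct < cutoff).sum())
    above = int((pct >= cutoff).sum())
    return {f"< {cutoff}": (below, round(100 * below / n, 1)),
            f">= {cutoff}": (above, round(100 * above / n, 1))}


# Toy stand-in data, NOT the College dataset
values = pd.Series([10, 20, 20, 30, 40])
print(weak_percentile(values).tolist())
print(split_counts(pd.Series([89.45, 95.0, 99.0])))
```

In the real workflow the ranks would be computed on the full 777-row column first, and only then filtered to the top-20 rows. Ranking within the subset would give different numbers.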
In [69]:
In [70]:
<ipython-input-70-0d67190b2fe9>:5: UserWarning: p-value floored: true value smaller than 0.001
  anderson_ksamp([base_df[var], subset_df[var]], midrank=True)
Out[70]:
Anderson_ksampResult(statistic=47.28847627549584, critical_values=array([0.325, 1.226, 1.961, 2.718, 3.752, 4.592, 6.546]), significance_level=0.001)
Out[70]:
KstestResult(statistic=0.8725868725868726, pvalue=1.1102230246251565e-16)
In [71]:
#'#####################################################################
Boxplots
In [72]:
#'#####################################################################
Statistical test
In [73]:
Out[73]:
KS (p-value) AD (min sig lvl) Significant
Apps 0.00000 0.00100 Y
Accept 0.00073 0.00119 Y
Enroll 0.00045 0.00100 Y
Top10perc 0.00000 0.00100 Y
Top25perc 0.00000 0.00100 Y
F.Undergrad 0.00113 0.00100 Y
P.Undergrad 0.01925 0.00297 Y
Outstate 0.00000 0.00100 Y
Room.Board 0.00000 0.00100 Y
Books 0.01071 0.00100 Y
Personal 0.44531 0.25000 N
PhD 0.00000 0.00100 Y
Terminal 0.00000 0.00100 Y
S.F.Ratio 0.00001 0.00100 Y
perc.alumni 0.00001 0.00100 Y
Expend 0.00000 0.00100 Y
Grad.Rate 0.00000 0.00100 Y
TExp.without.O 0.00000 0.00100 Y
TExp.with.O 0.00000 0.00100 Y
#'#####################################################################
Correlation matrix - Subset correlation and change in correlation
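This cell's code and output are not shown. Below is a sketch that assumes "change in correlation" means the subset correlation matrix minus the full-dataset one. `correlation_change` is a hypothetical helper, and synthetic data stands in for the College columns. Restricting a sample to the top of one variable usually weakens that variable's correlations with the others (range restriction). The toy example is built to show this.

```python
import numpy as np
import pandas as pd


def correlation_change(base_df, subset_df):
    """Return the subset correlation matrix and its difference from the base one."""
    base_corr = base_df.select_dtypes("number").corr()
    sub_corr = subset_df.select_dtypes("number").corr()
    return sub_corr, sub_corr - base_corr


# Synthetic stand-in data, NOT the College dataset
rng = np.random.default_rng(2)
x = rng.normal(size=777)
base_df = pd.DataFrame({"x": x, "y": x + rng.normal(scale=0.5, size=777)})
subset_df = base_df.nlargest(78, "x")  # top ~10% on x, like the Elite cut

sub_corr, delta = correlation_change(base_df, subset_df)
print(delta.round(2))
```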
In [74]:
#'################ workings'
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Short description of variables

Statistics for a large number of US Colleges from the 1995 issue of US News and World Report.

Private : Public/private indicator
Apps : Number of applications received
Accept : Number of applicants accepted
Enroll : Number of new students enrolled
Top10perc : Percentage of new students from top 10% of high school class
Top25perc : Percentage of new students from top 25% of high school class
F.Undergrad : Number of full-time undergraduates
P.Undergrad : Number of part-time undergraduates
Outstate : Out-of-state tuition
Room.Board : Room and board costs
Books : Estimated book costs
Personal : Estimated personal spending
PhD : Percent of faculty with Ph.D.’s
Terminal : Percent of faculty with terminal degree
S.F.Ratio : Student/faculty ratio
perc.alumni : Percent of alumni who donate
Expend : Instructional expenditure per student
Grad.Rate : Graduation rate